AD-A258  775 


MODELING  EXPERT  OPINION:  LIKELIHOODS  UNDER 
INCOMPLETE  PROBABILISTIC  SPECIFICATION 




by 

Alan  E.  Gelfand 
Bani  K.  Mallick
Dipak  K.  Dey 

TECHNICAL  REPORT  No.  464 
DECEMBER  9,  1992 

Prepared  Under  Contract 
N00014-92-J-1264  (NR-042-267)

FOR  THE  OFFICE  OF  NAVAL  RESEARCH 


Reproduction  in  whole  or  in  part  is  permitted 
for  any  purpose  of  the  United  States  Government.


Approved  for  public  release;  distribution  unlimited 


DEPARTMENT  OF  STATISTICS 
STANFORD  UNIVERSITY 
STANFORD,  CALIFORNIA  94305-4065 



Professor  Herbert  Solomon,  Project  Director 



Modeling  Expert  Opinion:  Likelihoods  Under 
Incomplete  Probabilistic  Specification


by 

Alan  E.  Gelfand
Bani  K.  Mallick 
and 

Dipak  K.  Dey* 
University  of  Connecticut 


Abstract 

Expert  opinion  is  often  sought  with  regard  to  unknowns  in  a  decision-making
setting.  Our  presumption  is  that  such  opinion  is  elicited  as  an  incomplete  probabilistic 
specification  either  in  the  form  of  probability  assignments  to  fixed  intervals  or  in  the  form 
of  selected  quantiles.  We  present  likelihoods  for  such  specification  which  arise  through 
random  mixtures  of  Beta  distributions.  We  presume  that  a  supra  Bayesian  presides  over 
the  opinion  collection  resulting  in  the  posterior  distribution  as  the  mechanism  for  pooling 
opinion.  The  models  are  applied  to  opinion  collected  regarding  points  per  game  for 
participants  in  the  1991  NBA  championship  basketball  series. 




1.  Introduction

Expert opinion is often sought with regard to unknowns in a decision-making
setting in order to improve the quality of the decision-making process. We presume that
this opinion regards a univariate unknown denoted by $\theta$, that it may be collected from
several experts and that it is probabilistic in nature.

In this context, to date, most work assumes that, at the individual level, an expert
fully supplies a probability measure for $\theta$. The literature on elicitation of a probability
measure is, by now, substantial. See Kahneman, Slovic and Tversky (1982) for a readable
review and Kadane et al. (1980) for implementation suggestions. A naive approach is to
insist that the individual probability measures for $\theta$ are members of a standard parametric
family whence the expert need only supply its parameters. This seems far too restrictive.
In fact, full specification of a distribution for a continuous parameter $\theta$ seems to be more
than we can reliably expect even an expert to provide. Rather, we assume that each expert
expresses belief regarding the unknown $\theta$ in a partial or incomplete way. That is, either
probabilities are provided for a small collection of disjoint exhaustive intervals in the
domain of $\theta$ or a small set of quantiles for the distribution of $\theta$ is provided.

One might seek incomplete specification in terms of the first few moments of the
distribution of $\theta$ (Genest and Schervish, 1985) but moments seem less intuitive than
probabilities or quantiles and hence less reliably collected.

It  is  noteworthy  that  we  have  traded  one  modeling  problem  for  another  in  that  we 
now  need  to  develop  appropriate  probabilistic  models  for  the  randomly  collected 
incomplete  expert  opinion  itself.  This  problem  is  the  subject  of  our  work  and  we  address  it 
primarily  at  the  individual  level.  Our  work  builds  upon  that  of  Lindley  (1985)  and  West 
(1988).  Modeling  the  joint  distribution  of  opinions  is,  in  principle,  no  harder  as  we  show. 
In  our  illustrative  example  we  do  have  the  simplification  of  independently  collected 
opinion. 

We adopt the supra Bayesian approach as our mechanism for combining expert
opinions. The literature on normative approaches for the formation of aggregate opinion is
substantial. See, for instance, the survey articles of French (1985), Genest and Zidek
(1986) and Chatterjee and Chatterjee (1987). Many of the papers in this area presume an
axiomatic  specification  of  a  set  of  properties  which  the  aggregation  mechanism  must  obey 
and  then  deduce  the  class  of  pooling  functions  meeting  these  properties.  We  do  not  engage 
in  philosophical  debate  regarding  selection  of  a  pooling  recipe.  We  are  drawn  to  the  supra 
Bayesian  stance  in  that  it  adopts  Bayes’  rule  as  the  pooling  operator  naturally  producing 
combined  opinion  as  the  posterior  distribution.  Often  there  is  an  implicit  external 
decision-maker  who  has  his/her  prior  opinions,  who  gathers  the  experts’  opinions  and  who 
is  capable  of  calibrating  the  relative  quality  of  each  expert’s  opinion.  We  seek  to  help  this 
decision  maker  specify  a  likelihood  for  this  collection  of  opinions  after  which  the  Bayesian 
paradigm  would  enable  the  desired  pooling. 

We recognize that many decision-making situations are not of this type, which
makes the required supra Bayesian specification difficult and argues against its use. The
supra Bayesian approach has its roots in Winkler (1968). The name was coined by Keeney
and Raiffa (1976). Genest and Zidek (1986) provide insightful additional discussion.

Lindley (1985) and West (1988) articulate very clearly the key issues in the
likelihood specification. In particular, specification of probabilities of disjoint exhaustive
sets results in a discrete distribution for $\theta$. Lindley essentially takes a multivariate normal
distribution for the logits of these probabilities. If the supra Bayesian specifies a prior for $\theta$
which is also a discrete distribution over these sets, then Lindley observes that the
posterior or combined opinion updates these prior probabilities in a linear fashion on the
log scale. In the case of specification of quantiles, under a continuous distribution for $\theta$,
West develops a likelihood emanating from a Dirichlet process which implicitly determines
the distribution for $\theta$. Our contribution provides alternative, richer and thus possibly
preferable, classes of likelihoods for opinions of each of the above types. Our starting point
is the family of mixtures of Beta distributions, a dense collection of integrable functions on
[0,1] (see, e.g., Diaconis and Ylvisaker, 1985). In section 2 we formalize our problem,
developing appropriate notation. In section 3 we consider likelihoods under discretized
distributions for $\theta$. In section 4 we consider likelihoods under quantile specification for $\theta$.
In both cases so-called missing data or marginal likelihoods emerge. In section 5 we
describe extension to multiple experts. In section 6 we show that computation required to
develop posteriors from such likelihoods can be handled using the Gibbs sampler (see, e.g.,
Gelfand and Smith, 1990). In section 7 we offer an example involving opinions on team
point totals in professional basketball games. In particular, several experts were asked to
supply information of the above sort with regard to the distribution of points per game
they anticipated for the participants in the 1991 NBA championship basketball series. We
present a synthesis of this opinion. The key features of our approach are summarized in
section 8.

2.  Notation  and  Preliminaries 

Let us assume that $\theta$ is univariate and that its domain, $\Theta$, is an interval in $\mathbb{R}^1$.
Consider a partition of $\Theta$ into k fixed sets determined by the points $a_0 < a_1 < \cdots < a_k$ where
$a_0 = \inf\{\theta \in \Theta\}$, $a_k = \sup\{\theta \in \Theta\}$ and let $I_j = (a_{j-1}, a_j)$. An expert supplies a vector
$p = (p_1, \ldots, p_k)^T$ where $p_j$ is the expert’s opinion regarding the chance that $\theta \in I_j$. A collection
of N expert opinions results in vectors $p_1, \ldots, p_N$ which we assemble into a $k \times N$ matrix
$P = (p_1, \ldots, p_N)$. Also consider a set of k ordered probabilities $0 < \alpha_1 < \alpha_2 < \cdots < \alpha_k < 1$
denoted by $\alpha$. Let $q = (q_1, \ldots, q_k)^T$ where $q_j$ is the expert’s opinion as to the $\alpha_j$-th quantile
of the distribution. Of course $q_1 < q_2 < \cdots < q_k$ with $q_1 > a_0$, $q_k < a_k$. Quantile specification
can also include the often used "smallest, middle, largest" elicitation taking these as, for
instance, the .005, .5, .995 quantiles respectively. Again with a collection of N experts we
would obtain vectors $q_1, \ldots, q_N$ which we assemble into a $k \times N$ matrix $Q = (q_1, \ldots, q_N)$.

What sort of prior information will the supra Bayesian provide? It is possible that
he/she may give a fully specified distribution for $\theta$ whose density we will denote by $f(\theta)$. It
may be more appropriate to assume that the supra Bayesian offers the same sort of
information that the expert does. In the case where a p is supplied by the expert we
assume that the supra Bayesian also provides a vector of probabilities for the sets $I_j$ which
we denote by $\rho$. In the case where q is supplied the supra Bayesian provides a vector of
quantiles $\gamma$ for the same collection of $\alpha$’s. Hence there are four types of likelihoods which
we shall consider. At the individual level these are: (i) $L(p \mid \theta)$, (ii) $L(p \mid \theta \in I_j)$, $j = 1, \ldots, k$,
(iii) $L(q \mid \theta)$ and (iv) $L(q \mid \theta \in (\gamma_{j-1}, \gamma_j))$. At the group level P replaces p, Q
replaces q.

Note that likelihoods must be specified with regard to events for $\theta$, in particular
events whose probabilities can be computed using the supra Bayesian’s prior opinion. In
other words, $L(p \mid \rho)$ or $L(q \mid \gamma)$ are not meaningful. Moreover, the Bayesian synthesis
updates the prior densities only for these events. In the case of (i) and (iii) the combined
opinion yields a fully specified posterior density for $\theta$. In case (ii) revised probabilities
for the intervals result. In case (iv) revised probabilities are associated with the $\gamma_j$ but
unfortunately no updated quantiles can be extracted. Such quantiles can be obtained under
(iii).

An alternative approach to handle the incomplete specification engendered in p or q
is to assume that the expert’s opinion is modeled by a parametric exponential family
having, in the case of p, a (k-1)-dimensional parameter or, in the case of q, a k-dimensional
parameter. Specification of p or of q then yields a system of equations which, in principle,
determines the member of the family, i.e., the expert’s distribution. Our approach is
nonparametric, avoiding the selection of the family as well as possible difficulties in solving
the system of equations.

One might suspect that, since $\theta$ is univariate, computing concerns are not an issue.
In fact, the likelihoods introduced in sections 3.2 and 4.2 are expressed as high dimensional
integrals over "missing data". Computation of posterior distributions under such
likelihoods is handled using the Gibbs sampler introduced as a Bayesian computing tool in
Gelfand and Smith (1990). Details are supplied in section 6.

3.  Likelihoods  for  discrete  probability  specification 

In this section we focus on likelihoods when p’s are collected, i.e., the cases of $L(p \mid \theta)$
and $L(p \mid \theta \in I_j)$. Note that appropriate behavior for such likelihoods requires that when $\theta$
is in, say, $I_j$, $p_j$ should tend to be large but, as $\theta$ moves further away from $I_j$, $p_j$ should tend
to be smaller. In section 3.1 we describe a natural first attempt at modeling these
likelihoods using the Dirichlet process. After noting several criticisms of this version we
review Lindley’s (1985) alternative. In section 3.2 we present a very broad class of
likelihoods arising from mixture distributions.

3.1.  Dirichlet  process  model;  Lindley’s  model 

Consider case (i) of section 2, $L(p \mid \theta)$. The most natural specification for $L(p \mid \theta)$
would be to assume that p is induced by some underlying probability measure G on
$(a_0, a_k)$. A well discussed mechanism for generating probability measures is the Dirichlet
process (see Ferguson, 1973 or Antoniak, 1974). In the present case we need to draw G
randomly given $\theta$ so we assume $G \mid \theta \sim DP(A;\, G_{0,\theta})$ where the continuous distribution $G_{0,\theta}$
is the mean of G and A is a precision parameter. For convenience we work with c.d.f.’s
rather than probability measures, i.e., $G_{0,\theta}(y)$ is the probability assigned to the interval
$(a_0, y)$. The induced distribution for p, $p \mid \theta$, is a Dirichlet distribution, $D(A;\, \beta(\theta))$, where
$\beta(\theta) = (\beta_1(\theta), \ldots, \beta_k(\theta))$ with $\beta_j(\theta) = G_{0,\theta}(a_j) - G_{0,\theta}(a_{j-1})$. We see that the precision
parameter A may be interpreted as reflecting (the supra Bayesian’s) confidence in the
expert’s opinion. Small A, i.e., small precision, implies that this expert’s p will tend to be
less informative about $\theta$ than under large A.

How shall we choose $G_{0,\theta}$? Consider $G_{0,\theta}(\cdot) = G_0(\cdot - \theta)$. Then for any fixed a,
$G_{0,\theta}(a)$ decreases in $\theta$. Hence $P(p_1 < b \mid \theta)$ increases in $\theta$ while $P(p_k < b \mid \theta)$ decreases in $\theta$
so that the likelihood exhibits appropriate behavior for extreme intervals. In addition
suppose we take $G_0$ to be a distribution which is symmetric about 0 and assume that the $a_j$ are
equally spaced, i.e., $a_j - a_{j-1}$ is constant, $j = 2, \ldots, k-1$. Then for $p_j$, $j = 2, \ldots, k-1$,
$P(p_j < b \mid \theta)$ will increase as $\theta$ moves away from $m_j = (a_j + a_{j-1})/2$ in either direction, again
appropriate behavior for the likelihood. In considering choices for $G_0$, e.g., normal, t,
logistic, we can assume that the scale parameter $\sigma = 1$ or else $\sigma$ and A will not be
identifiable in $p \mid \theta$. Then, the more heavy-tailed $G_0$ is, the more probability will tend to be
placed in the extreme intervals.
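The Dirichlet likelihood of this section can be sketched numerically as follows. This is only an illustration under stated assumptions: the baseline c.d.f. $G_0$ is taken to be standard logistic (one of the choices mentioned above), and the cut points, the opinion vector and the precision value are hypothetical numbers, as are all function names.

```python
import math

def logistic_cdf(x):
    """Assumed baseline c.d.f. G_0: standard logistic with scale fixed at 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def interval_means(theta, cuts):
    """beta_j(theta) = G_0(a_j - theta) - G_0(a_{j-1} - theta), j = 1..k."""
    cdf = [logistic_cdf(a - theta) for a in cuts]
    return [hi - lo for lo, hi in zip(cdf[:-1], cdf[1:])]

def dirichlet_loglik(p, theta, cuts, precision):
    """log density of p | theta ~ Dirichlet(precision * beta(theta))."""
    alpha = [precision * b for b in interval_means(theta, cuts)]
    out = math.lgamma(sum(alpha))
    for a_j, p_j in zip(alpha, p):
        out += (a_j - 1.0) * math.log(p_j) - math.lgamma(a_j)
    return out

# Hypothetical inputs: four intervals and one expert's probabilities for them.
cuts = [-math.inf, 95.0, 100.0, 105.0, math.inf]
p = [0.1, 0.3, 0.4, 0.2]
for theta in (90.0, 101.0, 112.0):
    print(theta, dirichlet_loglik(p, theta, cuts, precision=10.0))
```

With these inputs the log likelihood is largest for the value of theta lying among the intervals that the expert favours, which is the qualitative behavior the text asks of the likelihood.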

This likelihood is immediately adapted to the case (ii), $L(p \mid \theta \in I_j)$. The conditional
distribution $p \mid \theta \in I_j$ is interpreted as $p \mid \theta = m_j$, i.e., $D(A,\, \beta^{(j)})$ where
$\beta^{(j)} = (\beta_1^{(j)}, \ldots, \beta_k^{(j)})$ with $\beta_i^{(j)} = G_0(a_i - m_j) - G_0(a_{i-1} - m_j)$, with $m_j$ as above
and $m_1 < a_1$, $m_k > a_{k-1}$. (In fact, $m_j$ can be an arbitrary point in $(a_{j-1}, a_j)$.) This
likelihood again exhibits proper behavior.

The Dirichlet process for generating random distributions has simplicity in its favor
but may be criticized as follows. First, the only G’s which can be generated under this
process are discrete, an unappealing limitation. Secondly, with regard to the induced
likelihood for p, as observed in discussion by Lindley (1988), probabilities for all intervals,
hence for adjacent intervals, will be negatively correlated, i.e., $\mathrm{cov}(p_j, p_{j'} \mid \theta) < 0$. This
reflects unsatisfying behavior for the likelihood in that if, for instance, $\theta$ encourages large $p_j$
it should also encourage large $p_{j-1}$ and $p_{j+1}$. Of course, since $\sum_j p_j = 1$, at least some
correlations amongst the $p_j$ must be negative regardless of how the likelihood is specified.
Moreover, k > 3 is required in order that it be possible for all adjacent correlations to be
positive.
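The negative correlation under the Dirichlet model can be checked directly from the standard closed form for Dirichlet covariances, $\mathrm{cov}(p_i, p_j) = \beta_i(\delta_{ij} - \beta_j)/(A+1)$ for a mean vector $\beta$ and precision A. The sketch below uses hypothetical values for the mean vector and precision.

```python
def dirichlet_cov(beta, precision):
    """Covariance matrix of p ~ Dirichlet(precision * beta), where sum(beta) == 1."""
    k = len(beta)
    return [[beta[i] * ((i == j) - beta[j]) / (precision + 1.0) for j in range(k)]
            for i in range(k)]

# Hypothetical mean vector for k = 4 intervals and precision A = 10.
cov = dirichlet_cov([0.1, 0.2, 0.3, 0.4], precision=10.0)
for row in cov:
    print(["%+.4f" % c for c in row])
```

Every off-diagonal entry is negative whatever the mean vector, so adjacent intervals can never be positively correlated under this model.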

To attempt to alleviate this problem Lindley (1985) suggests converting the
probabilities to the log scale and using multivariate normal models. Most simply, if we let
the $\ell_i$ be baseline logits, i.e., $\ell_i = \log(p_i/p_k)$, $i = 1, \ldots, k-1$, Lindley assumes that
$\ell = (\ell_1, \ldots, \ell_{k-1})$ given $\theta$ (or $\theta \in I_j$) is modeled as a multivariate normal whence
covariances can be set as desired. Of course the covariances induced for p need not agree in
sign with those of the corresponding $\ell$. (Consider the case where all covariances amongst
the $\ell_i$ are assumed nonnegative.) Moreover, the collection of normal models for $\ell$ results in
a limited collection of models for $p \mid \theta$ (additive logistic normal models in the terminology of
Aitchison, 1986). In this regard see also Bernardo’s (1985) discussion to Lindley’s paper.
Thus Lindley’s formulation is not totally satisfying. We would be remiss however if we
failed to articulate the advantages of Lindley’s likelihood: mathematical convenience, ease
of interpretation and a linear pooling mechanism on the log scale.
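The baseline-logit transformation and its inverse are easy to write down. The sketch below also draws one opinion vector from a multivariate normal on the logit scale; the mean vector and the lower-triangular Cholesky factor are hypothetical values chosen only for illustration, and the function names are not from the source.

```python
import math
import random

def baseline_logits(p):
    """l_i = log(p_i / p_k), i = 1..k-1, with the last interval as baseline."""
    return [math.log(p_i / p[-1]) for p_i in p[:-1]]

def inverse_baseline_logits(logits):
    """Map k-1 real logits back to a probability vector of length k."""
    m = max([0.0] + list(logits))              # subtract the max for stability
    w = [math.exp(l - m) for l in logits] + [math.exp(-m)]
    s = sum(w)
    return [x / s for x in w]

def sample_opinion(mean, chol, rng):
    """Draw logits ~ N(mean, chol chol^T) and return the induced p."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    logits = [m + sum(chol[i][j] * z[j] for j in range(i + 1))
              for i, m in enumerate(mean)]
    return inverse_baseline_logits(logits)

# Hypothetical mean and Cholesky factor giving positively correlated logits.
mean = [0.0, 0.5, 1.0]
chol = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.8, 0.3, 0.52]]
print(sample_opinion(mean, chol, random.Random(1)))
```

Repeated draws could be used to examine empirically how covariances on the logit scale translate into covariances among the $p_j$, which, as noted above, need not agree in sign.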


3.2  Mixture  of  Betas  model 

We now consider another class of likelihoods for $L(p \mid \theta)$ and $L(p \mid \theta \in I_j)$ which
addresses the criticism of the previous section. We first remind the reader that any
continuous density on [0,1] can be arbitrarily well approximated by a mixture of Beta
densities. See, e.g., Diaconis and Ylvisaker (1985, p. 136) for a formal statement and proof.

The  order  of  the  approximation  is  r  ^  where  r  is  the  number  of  terms  in  the  mixture. 

Inversion by a fixed continuous c.d.f. yields a random distribution G on $(a_0, a_k)$. Other
mixture classes could be used as well. For instance, mixtures of gamma densities can
arbitrarily well approximate any continuous distribution on $\mathbb{R}^+$. See Diaconis and
Ylvisaker for more general discussion. The important point is the use of mixing.

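Suppose we denote an r-component mixture of Beta densities by
$\sum_{\ell=1}^{r} w_\ell\, Be(u \mid \eta_{1\ell}, \eta_{2\ell})$
where $w = (w_1, \ldots, w_r)$ is a vector on the r-dimensional simplex. Let $\eta_\ell = (\eta_{1\ell}, \eta_{2\ell})$
and $\eta = (\eta_1, \ldots, \eta_r)$. If $G_{0,\theta}(\cdot) = G_0(\cdot - \theta)$ is a specified c.d.f. such as in section 3.1 and if
U is drawn at random from this mixture, then consider the random variable
$Y = G_{0,\theta}^{-1}(U) = \theta + G_0^{-1}(U)$. The distribution of Y, say $G(\cdot) \mid w, \eta, \theta$, has the form

$$ G(\cdot) \mid w, \eta, \theta = P(Y \le \cdot \mid w, \eta, \theta) = P(U \le G_0(\cdot - \theta) \mid w, \eta). \qquad (1) $$

A vector p arises from G as in the previous section. From (1)

$$ p_j = P\big(U \in [G_0(a_{j-1} - \theta),\, G_0(a_j - \theta)] \mid w, \eta\big) = \sum_{\ell=1}^{r} w_\ell\, h_{j\ell}(\eta_\ell, \theta) \qquad (2) $$

where $h_{j\ell}(\eta_\ell, \theta)$ is the area under the $Be(u \mid \eta_{1\ell}, \eta_{2\ell})$ density between $G_0(a_{j-1} - \theta)$ and
$G_0(a_j - \theta)$, i.e., the difference between two incomplete Beta functions. From (2)

$$ p = H(\eta, \theta)\, w \qquad (3) $$

where $H(\eta, \theta)$ is a $k \times r$ matrix having $(j, \ell)$ entry $h_{j\ell}(\eta_\ell, \theta)$. Hence p arises as a linear
transformation of w. Note that the columns of $H(\eta, \theta)$ sum to 1.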
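The construction in (2) and (3) can be sketched numerically. The code below is an illustration under assumptions, not the authors' implementation: $G_0$ is taken to be standard logistic, the incomplete Beta function is approximated by Simpson's rule (adequate here because all Beta parameters are at least 1), and the cut points, Beta parameters and weights are hypothetical.

```python
import math

def logistic_cdf(x):
    """Assumed baseline c.d.f. G_0 (standard logistic)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def beta_cdf(x, a, b, n=2000):
    """Incomplete Beta ratio by composite Simpson's rule; assumes a, b >= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    norm = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    pdf = lambda u: norm * u ** (a - 1.0) * (1.0 - u) ** (b - 1.0)
    h = x / n
    total = pdf(0.0) + pdf(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3.0

def h_matrix(theta, cuts, eta):
    """k x r matrix whose (j, l) entry is the Be(eta_l) mass between
    G_0(a_{j-1} - theta) and G_0(a_j - theta), as in (2)."""
    ends = [logistic_cdf(a - theta) for a in cuts]
    cols = [[beta_cdf(e, a, b) for e in ends] for a, b in eta]
    return [[col[j + 1] - col[j] for col in cols] for j in range(len(cuts) - 1)]

def opinion(theta, cuts, eta, w):
    """p = H(eta, theta) w, as in (3)."""
    return [sum(h * w_l for h, w_l in zip(row, w))
            for row in h_matrix(theta, cuts, eta)]

# Hypothetical inputs: k = 4 intervals, r = 5 Beta components.
cuts = [-math.inf, 95.0, 100.0, 105.0, math.inf]
eta = [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]
w = [0.1, 0.2, 0.4, 0.2, 0.1]
print(opinion(100.0, cuts, eta, w))
```

Because each column of the matrix telescopes from the Beta c.d.f. at 0 to the Beta c.d.f. at 1, the columns sum to 1 and the resulting p is a probability vector for any w on the simplex.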

In order to develop a likelihood $L(p; \theta)$ we need to introduce a random mechanism
for p given $\theta$. Suppose we assume that r, the "denseness" measure of our family of
mixture distributions, and $\eta$ are specified but that w is random, i.e.,
$\sum_{\ell=1}^{r} w_\ell\, Be(u \mid \eta_{1\ell}, \eta_{2\ell})$ is an r-component random mixture of fixed Beta densities.
If w is random then G is and thus p is as well. The density of w, denoted by $f_w(\cdot)$, is on the
r-dimensional simplex. The book of Aitchison (1986) is devoted almost entirely to discussion of
distributions on simplexes; in our illustrative example we chose $f_w$ to be a Dirichlet. Note
that given r, for any set of $Be(u \mid \eta_{1\ell}, \eta_{2\ell})$, vectors p arising under (3) do not span the
simplex. For instance, clearly $\min_\ell h_{j\ell}(\eta_\ell, \theta) \le p_j \le \max_\ell h_{j\ell}(\eta_\ell, \theta)$, with further constraints
arising due to the restriction of w to the r-dimensional simplex. To remedy this problem,
suppose we constrain or modify sampled opinion vectors p such that $p_j > \epsilon$ for all j. This is
not much of a restriction; it only requires that every interval is assigned at least some
minimum probability. But then, for r large enough, there will be an interval for $\theta$ in which
$L(p; \theta) > 0$. Practically, this means choosing r large relative to k and taking $L(p; \theta) = 0$
when $\theta$ is too extreme to permit p to be observed. Lastly, how shall we specify $\eta$? The
result of Diaconis and Ylvisaker is not constructive in this regard. In practice, we have
chosen the $\eta_\ell$ to yield a set of Beta densities which "cover" [0, 1]. For a given r, a flexible
choice is $\eta_{1\ell} = \delta\{c(r+1-\ell) + (1-c)\ell\}$, $\eta_{2\ell} = \delta\{c\ell + (1-c)(r+1-\ell)\}$. We suppress $\eta$ in the
subsequent notation.

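We now obtain $L(p; \theta)$. Since $\sum_j p_j = 1$ and $\sum_\ell w_\ell = 1$ we rewrite (3) as

$$ p^{(1)} = H^{(1)}(\theta)\, w^{(1)} + h_r^{(1)}(\theta) \qquad (4) $$

where $p^{(1)} = (p_1, \ldots, p_{k-1})^T$ and $w^{(1)} = (w_1, \ldots, w_{r-1})^T$. In (4), if we write
$H(\theta) = (h_1(\theta) \cdots h_r(\theta))$ and let $h_\ell^{(1)}(\theta)$ be $h_\ell(\theta)$ with the last row omitted, then
$H^{(1)}(\theta) = (h_1^{(1)}(\theta) - h_r^{(1)}(\theta), \ldots, h_{r-1}^{(1)}(\theta) - h_r^{(1)}(\theta))$. If we let $Z = w^{(1)}_{r-k}$ where
$w^{(1)}_{r-k} = (w_1, \ldots, w_{r-k})^T$, then consider the linear transformation from $w^{(1)}$ into $(Z, p^{(1)})$, i.e.,

$$ \begin{pmatrix} Z \\ p^{(1)} - h_r^{(1)}(\theta) \end{pmatrix} = A(\theta)\, w^{(1)} \qquad (5) $$

where, if $H^{(1)}(\theta) = (H_1^{(1)}(\theta),\, H_2^{(1)}(\theta))$, $H_1^{(1)}$ being $(k-1) \times (r-k)$, $H_2^{(1)}$ being $(k-1) \times (k-1)$,
then

$$ A(\theta) = \begin{pmatrix} I_{r-k} & 0 \\ H_1^{(1)}(\theta) & H_2^{(1)}(\theta) \end{pmatrix}. $$

Thus, if $t(\theta) = p^{(1)} - h_r^{(1)}(\theta)$,

$$ f(Z, p^{(1)} \mid \theta) = f_{w^{(1)}}\!\left( A^{-1}(\theta) \begin{pmatrix} Z \\ t(\theta) \end{pmatrix} \right) \cdot \big|\, |A(\theta)|^{-1} \big| \qquad (6) $$

where $|A(\theta)| = |H_2^{(1)}(\theta)|$ and

$$ A^{-1}(\theta) = \begin{pmatrix} I_{r-k} & 0 \\ -(H_2^{(1)})^{-1} H_1^{(1)} & (H_2^{(1)})^{-1} \end{pmatrix}. $$

Finally

$$ L(p; \theta) = \int_{C(p;\theta)} f(Z, p^{(1)} \mid \theta)\, dZ \qquad (7) $$

where $C(p; \theta)$ denotes the restriction of Z resulting from (5) given p and $\theta$ with $w^{(1)}$
constrained to the r-dimensional simplex. We are assuming that p and $\theta$ are compatible,
i.e., $L(p; \theta) > 0$, whence from (7) the set $C(p; \theta)$ has positive Lebesgue measure. More
precisely we have

$$ 0 \le A^{-1}(\theta) \begin{pmatrix} Z \\ t(\theta) \end{pmatrix} \quad \text{and} \quad 1_{r-1}^T A^{-1}(\theta) \begin{pmatrix} Z \\ t(\theta) \end{pmatrix} \le 1 $$

which simplify to

$$ Z \ge 0, \qquad (H_2^{(1)})^{-1} H_1^{(1)} Z \le (H_2^{(1)})^{-1} t(\theta) $$

and

$$ 1_{r-k}^T Z - 1_{k-1}^T (H_2^{(1)})^{-1} H_1^{(1)} Z + 1_{k-1}^T (H_2^{(1)})^{-1} t(\theta) \le 1. \qquad (8) $$

We thus note that restriction to $C(p; \theta)$ provides linear constraints on Z. The likelihood in
(7) may be viewed as a "missing data" or marginalized likelihood. Computation of the
posterior for $\theta$ under (7) with specification of the supra Bayesian’s prior can be carried out
using the Gibbs sampler. Details are provided in section 6. As in section 3.1,
$L(p \mid \theta \in I_j)$ is interpreted as $L(p \mid m_j)$.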

From (3) note that $P(p_1 < b \mid \theta) = P(\sum_\ell w_\ell\, h_{1\ell}(\theta) < b)$. As $\theta$ increases $h_{1\ell}(\theta)$
decreases for each $\ell$. Thus the random variable $\sum_\ell w_\ell\, h_{1\ell}(\theta_1)$ is greater than the random
variable $\sum_\ell w_\ell\, h_{1\ell}(\theta_2)$ if $\theta_1 < \theta_2$, whence $P(p_1 < b \mid \theta)$ increases in $\theta$. Similarly $P(p_k < b \mid \theta)$
decreases in $\theta$. The likelihood in (7) behaves appropriately for extreme intervals. Precise
description of the behavior of $P(p_j < b \mid \theta)$ as a function of $\theta$ depends upon the choice of
$c_\ell$ and $d_\ell$ and on $G_0$. For the aforementioned choices of $c_\ell$ and $d_\ell$, since each $h_{j\ell}(\theta)$ as a
function of $\theta$ will necessarily increase to a unique maximum and then decrease, eventually
$P(p_j < b \mid \theta)$ will increase as $\theta$ grows large or small, again desired behavior for the
likelihood.

Finally, the mean of p is $E(p \mid \theta) = H(\theta)\, E(w)$ and the covariance matrix for p is
$\Sigma_p(\theta) = H(\theta)\, \Sigma_w\, H(\theta)^T$ where $\Sigma_w$ is the covariance matrix of w. Similar forms arise in the case
of $L(p \mid \theta \in I_j)$. Clearly the covariance structure available for p under (3) is much richer
than under the Dirichlet model.
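The moment formulas above are straightforward to evaluate once H and the moments of w are available. The sketch below assumes w is Dirichlet (the choice used in the illustrative example) and takes a hypothetical 4 x 5 matrix H whose columns sum to 1; the function names and numbers are illustrative only.

```python
def dirichlet_moments(a):
    """Mean vector and covariance matrix of w ~ Dirichlet(a)."""
    a0 = sum(a)
    m = [x / a0 for x in a]
    r = len(a)
    cov = [[m[i] * ((i == j) - m[j]) / (a0 + 1.0) for j in range(r)]
           for i in range(r)]
    return m, cov

def opinion_moments(H, a):
    """E(p | theta) = H E(w) and Sigma_p = H Sigma_w H^T for a k x r matrix H."""
    m, cov = dirichlet_moments(a)
    k, r = len(H), len(a)
    mean_p = [sum(H[j][l] * m[l] for l in range(r)) for j in range(k)]
    HS = [[sum(H[j][l] * cov[l][c] for l in range(r)) for c in range(r)]
          for j in range(k)]
    cov_p = [[sum(HS[i][c] * H[j][c] for c in range(r)) for j in range(k)]
             for i in range(k)]
    return mean_p, cov_p

# Hypothetical H (columns sum to 1) and Dirichlet parameters for w.
H = [[0.60, 0.30, 0.10, 0.05, 0.00],
     [0.30, 0.40, 0.40, 0.15, 0.10],
     [0.10, 0.25, 0.40, 0.40, 0.30],
     [0.00, 0.05, 0.10, 0.40, 0.60]]
a = [1.0, 2.0, 3.0, 2.0, 1.0]
mean_p, cov_p = opinion_moments(H, a)
print(mean_p)
print([cov_p[j][j + 1] for j in range(3)])   # covariances of adjacent intervals
```

Printing the adjacent covariances for different H makes it possible to see how the overlap between the Beta components controls their sign, in contrast to the Dirichlet model where they are always negative.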

4.  Likelihoods  under  quantile  specification 

In this section we develop likelihood functions for cases (iii) and (iv) of section 2,
i.e., $L(q \mid \theta)$ and $L(q \mid \theta \in (\gamma_{j-1}, \gamma_j))$ respectively. How should these likelihoods behave in
order to be sensible? They should be such that, with increasing $\theta$, $q_j$ tends to increase and
such that any pair $q_i$, $q_j$ are positively correlated. In section 4.1 we review West’s (1988)
approach based upon the Dirichlet process while in section 4.2 we start with random
mixtures of Beta densities to create these likelihoods.


4.1.  West’s  model 

Recall that, for the set of probabilities $0 < \alpha_1 < \alpha_2 < \cdots < \alpha_k < 1$, the expert specifies a
vector of quantiles $q = (q_1, \ldots, q_k)^T$. West proposes a density for q given $\theta$ which arises as
follows. Suppose we think of q as being the quantiles of a random distribution G given $\theta$.
Suppose we again consider a baseline c.d.f. $G_{0,\theta}(\cdot) = G_0(\cdot - \theta)$ as in section 3. Let F be a
distribution (c.d.f.) derived from a Dirichlet process on [0,1]. Then the random
distribution G given $\theta$ is defined implicitly through $F^{-1}(\cdot) = G_{0,\theta}(G^{-1}(\cdot))$. In
particular, if $\pi_j$ is the $\alpha_j$-th quantile of F, i.e., $\pi_j = F^{-1}(\alpha_j)$, $j = 1, \ldots, k$, then
$q_j = G_{0,\theta}^{-1}(\pi_j) = G_0^{-1}(\pi_j) + \theta$. Observe that the Dirichlet process which generates F need not be specified;
rather we need only an ordered Dirichlet distribution, say $D(\delta)$, which generates
$\pi = (\pi_1, \ldots, \pi_k)$. Then q is obtained from $\pi$ by inverting $G_{0,\theta}$. The resultant likelihood is

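$$ L(q \mid \theta) \propto f_\delta\big(G_0(q - \theta 1_k)\big) \prod_{j=1}^{k} \frac{dG_0(q_j - \theta)}{dq_j} \qquad (9) $$

where $f_\delta$ denotes the density of the ordered Dirichlet distribution $D(\delta)$.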


Though not mentioned by West, we can immediately modify (9) to a likelihood for q given
$\theta \in (\gamma_{j-1}, \gamma_j)$ by replacing $G_{0,\theta}$ with $G_{0,n_j}$ where the $n_j$ are selected analogously to the $m_j$
in section 3.1.
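A draw of q given $\theta$ under this construction can be simulated as below. The sketch rests on two assumptions that should be read as such: the ordered Dirichlet vector $\pi$ is generated as the cumulative sums of a Dirichlet vector with k+1 parameters (obtained from normalized gamma draws), and $G_0$ is standard logistic so that its inverse is the logit. The parameter values are hypothetical.

```python
import math
import random

def sample_west_quantiles(theta, delta, rng):
    """Draw q | theta: pi = cumulative sums of a Dirichlet(delta) vector,
    then q_j = G_0^{-1}(pi_j) + theta with G_0 standard logistic."""
    gaps = [rng.gammavariate(d, 1.0) for d in delta]   # k + 1 gamma draws
    total = sum(gaps)
    q, running = [], 0.0
    for g in gaps[:-1]:
        running += g
        pi_j = running / total                         # 0 < pi_1 < ... < pi_k < 1
        q.append(theta + math.log(pi_j / (1.0 - pi_j)))
    return q

# Hypothetical Dirichlet parameters for k = 3 quantiles.
print(sample_west_quantiles(100.0, [2.0, 2.0, 2.0, 2.0], random.Random(0)))
```

The draws are increasing by construction and shift one-for-one with $\theta$, which is the location behavior required of a sensible likelihood for quantiles.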

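West comments that the likelihood (9) behaves appropriately. In fact, since
$P(q_j < b \mid \theta) = P(G_0^{-1}(\pi_j) + \theta < b)$ decreases in $\theta$, $q_j$ tends to increase in $\theta$. Moreover,
$\mathrm{cov}(q_i, q_j) = \mathrm{cov}(G_0^{-1}(\pi_i), G_0^{-1}(\pi_j)) > 0$ if and only if $\mathrm{cov}(\pi_i, \pi_j) > 0$. Since the $\pi_i$, $\pi_j$ arise
from $D(\delta)$ with $\delta = (\delta_1, \ldots, \delta_{k+1})$ we can calculate that, if $i < j$,

$$ \mathrm{cov}(\pi_i, \pi_j) = \frac{\left(\sum_{\ell=1}^{i} \delta_\ell\right)\left(\sum_{\ell=j+1}^{k+1} \delta_\ell\right)}{\left(\sum_{\ell=1}^{k+1} \delta_\ell\right)^{2}\left(\sum_{\ell=1}^{k+1} \delta_\ell + 1\right)} > 0. $$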


4.2.  Mixture  of  Betas  model 

Here we parallel the development of section 3.2. A distribution on [0,1] which is a
random mixture of Beta distributions is selected and then inverted to a distribution on
$(a_0, a_k)$. Then $q_j$ is taken as the $\alpha_j$-th quantile of the latter distribution. In particular, if
$U \mid w \sim \sum_\ell w_\ell\, Be(u \mid \eta_{1\ell}, \eta_{2\ell})$ and $\varphi_j$ is the $\alpha_j$-th quantile of $U \mid w$, $j = 1, 2, \ldots, k$, analogous to
section 4.1, let $q_j = G_{0,\theta}^{-1}(\varphi_j) = G_0^{-1}(\varphi_j) + \theta$. Since w is random so is $\varphi = (\varphi_1, \ldots, \varphi_k)^T$
and hence q. The $q_j$ are the $\alpha_j$-th quantiles of the distribution of the variable $Y = G_{0,\theta}^{-1}(U)$
given w. This random distribution is given in (1). Using obvious notation, we have
$\varphi = G_0(q - \theta 1_k)$ where $G_0(q - \theta 1_k)^T = (G_0(q_1 - \theta), \ldots, G_0(q_k - \theta))$.
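For a fixed w the quantiles implied by a Beta mixture can be computed by solving for each $\varphi_j$ numerically and then inverting $G_0$. The sketch below is a minimal illustration under assumptions: $G_0$ is standard logistic, the Beta c.d.f. is approximated by Simpson's rule (all parameters at least 1), the root is found by bisection, and the weights, Beta parameters and probability levels are hypothetical.

```python
import math

def beta_cdf(x, a, b, n=400):
    """Incomplete Beta ratio by composite Simpson's rule; assumes a, b >= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    norm = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    pdf = lambda u: norm * u ** (a - 1.0) * (1.0 - u) ** (b - 1.0)
    h = x / n
    total = pdf(0.0) + pdf(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3.0

def mixture_cdf(u, w, eta):
    """C.d.f. of U | w, the r-component Beta mixture."""
    return sum(w_l * beta_cdf(u, a, b) for w_l, (a, b) in zip(w, eta))

def mixture_quantile(alpha, w, eta, tol=1e-10):
    """phi with mixture_cdf(phi) = alpha, found by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(mid, w, eta) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def expert_quantiles(theta, alphas, w, eta):
    """q_j = G_0^{-1}(phi_j) + theta with G_0 standard logistic."""
    out = []
    for alpha in alphas:
        phi = mixture_quantile(alpha, w, eta)
        out.append(theta + math.log(phi / (1.0 - phi)))
    return out

# Hypothetical inputs: "smallest, middle, largest" levels and r = 5 components.
alphas = (0.005, 0.5, 0.995)
eta = [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]
w = [0.1, 0.2, 0.4, 0.2, 0.1]
print(expert_quantiles(100.0, alphas, w, eta))
```

Redrawing w from a distribution on the simplex and repeating the calculation gives draws of q given $\theta$ under the mixture model.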

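What is the likelihood $L(q \mid \theta)$? Given w,
$\alpha_j = \int_0^{\varphi_j} \sum_{\ell=1}^{r} w_\ell\, Be(u \mid \eta_{1\ell}, \eta_{2\ell})\, du = \sum_{\ell=1}^{r} w_\ell\, b_\ell(\varphi_j)$
where $b_\ell(\varphi_j) = \int_0^{\varphi_j} Be(u \mid \eta_{1\ell}, \eta_{2\ell})\, du$, $j = 1, 2, \ldots, k$. Hence

$$ \alpha = B(\varphi)\, w \qquad (10) $$

where $B(\varphi)$ is $k \times r$ with $(j, \ell)$ entry $b_\ell(\varphi_j)$. Analogous to (4) we rewrite (10) as

$$ \alpha = B^{(1)}(\varphi)\, w^{(1)} + b_r(\varphi) \qquad (11) $$

where, if $B(\varphi) = (b_1(\varphi), \ldots, b_r(\varphi))$, $B^{(1)}(\varphi) = (b_1(\varphi) - b_r(\varphi), \ldots, b_{r-1}(\varphi) - b_r(\varphi))$.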

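Again we introduce missing variables $Z = (w_1, \ldots, w_{r-1-k})^T$ and consider the
transformation from $w^{(1)}$ to $(Z, \varphi)$, i.e.,

$$ \begin{pmatrix} Z \\ \alpha - b_r(\varphi) \end{pmatrix} = D(\varphi)\, w^{(1)}, \qquad w^{(1)} = D^{-1}(\varphi) \begin{pmatrix} Z \\ \alpha - b_r(\varphi) \end{pmatrix} \qquad (12) $$

where, if $B^{(1)}(\varphi) = (B_1^{(1)}(\varphi),\, B_2^{(1)}(\varphi))$, $B_1^{(1)}$ being $k \times (r-1-k)$, $B_2^{(1)}$ being $k \times k$,
then

$$ D(\varphi) = \begin{pmatrix} I_{r-1-k} & 0 \\ B_1^{(1)}(\varphi) & B_2^{(1)}(\varphi) \end{pmatrix}. $$

Thus, if $t(\varphi) = \alpha - b_r(\varphi)$ and $J(Z, \varphi) = \partial w^{(1)} / \partial (Z, \varphi)$ denotes the Jacobian matrix of (12), then
$f(Z, \varphi) = f_{w^{(1)}}\big(D^{-1}(\varphi)\, (Z^T, t(\varphi)^T)^T\big)\, |J(Z, \varphi)|$
and therefore

$$ f(Z, q \mid \theta) = f_{w^{(1)}}\!\left( D^{-1}(\varphi) \begin{pmatrix} Z \\ t(\varphi) \end{pmatrix} \right) \big| J(Z, \varphi) \big| \;\Bigg|_{\varphi = G_0(q - \theta 1_k)} \; \prod_{j=1}^{k} \frac{dG_0(q_j - \theta)}{dq_j}. \qquad (13) $$

Finally

$$ L(q; \theta) = \int_{C(q,\theta)} f(Z, q \mid \theta)\, dZ \qquad (14) $$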

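where $C(q, \theta)$ denotes the restriction of Z resulting from (12) given q and $\theta$ with $w^{(1)}$
constrained to the r-dimensional simplex. We are assuming that q and $\theta$ are compatible,
i.e., $L(q; \theta) > 0$, whence $C(q, \theta)$ has positive Lebesgue measure. Analogous to (8), with
$\varphi = G_0(q - \theta 1_k)$, these become

$$ Z \ge 0, \qquad (B_2^{(1)}(\varphi))^{-1} B_1^{(1)}(\varphi)\, Z \le (B_2^{(1)}(\varphi))^{-1} t(\varphi) \qquad (15) $$

and

$$ 1_{r-1-k}^T Z - 1_k^T (B_2^{(1)}(\varphi))^{-1} B_1^{(1)}(\varphi)\, Z + 1_k^T (B_2^{(1)}(\varphi))^{-1} t(\varphi) \le 1. $$

Importantly, the restriction to $C(q, \theta)$ still provides linear constraints on Z.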

The Jacobian matrix in (13), $\partial w^{(1)} / \partial (Z, \varphi)$, has entries $\partial w_i / \partial Z_j$ and $\partial w_i / \partial \varphi_j$ which
appear to be straightforward to calculate using (12). The difficulty is that $D^{-1}(\varphi)$ cannot
be obtained explicitly so that it cannot be differentiated appropriately with respect to $\varphi_j$
and then evaluated at $G_0(q - \theta 1_k)$. In section 6 we show how the calculation of J can be
simplified using an implicit function theorem.

As in (7), the likelihood in (14) may be viewed as a "missing data" or marginalized
likelihood. Computation of the posterior for $\theta$ under (14) with specification of the supra
Bayesian’s prior can be carried out using the Gibbs sampler. Details are provided in
section 6. Modification for $L(q \mid \theta \in (\gamma_{j-1}, \gamma_j))$ replaces $\theta$ in (14) with $n_j$.

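Turning to the behavior of (14), from (10), $P(q_j < b \mid \theta) = P(\varphi_j < G_0(b - \theta))$ which
decreases in $\theta$ so $q_j$ tends to increase in $\theta$. Clearly $\mathrm{cov}(q_i, q_j) > 0$ if and only if
$\mathrm{cov}(\varphi_i, \varphi_j) > 0$. For a given $\alpha_i$, $\alpha_j$ and two w vectors, say $w_1$ and $w_2$, let $\varphi_i^{(1)}$, $\varphi_j^{(1)}$ be the quantiles
arising from $w_1$ and $\varphi_i^{(2)}$, $\varphi_j^{(2)}$ those arising from $w_2$. Consider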

^  jJ*^  ^ 

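For $\alpha_i - \alpha_j$ small, by the continuity of the $b_\ell(\varphi)$, the second and
third terms will be small and offsetting. Thus $\mathrm{sgn}(\varphi_i^{(1)} - \varphi_i^{(2)})$ will tend to be the same as
$\mathrm{sgn}(\varphi_j^{(1)} - \varphi_j^{(2)})$ whence averaging over w will result in $\mathrm{cov}(\varphi_i, \varphi_j) > 0$ for close quantiles.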


5.  Several  experts 

Typically in the elicitation of expert opinion more than one source is tapped.
Suppose that N opinions are collected. Often the supra Bayesian will obtain independent
evaluations of the distribution of $\theta$ whence we obtain the likelihoods

$$ L(P \mid \theta) = \prod_{i=1}^{N} L(p_i \mid \theta) $$

$$ L(Q \mid \theta) = \prod_{i=1}^{N} L(q_i \mid \theta). $$

Dependence amongst opinions may arise in a variety of ways. For instance, individual
opinion may be solicited after group deliberation or perhaps the opinion of the t-th expert is
obtained after he/she has been shown the opinion of the previous t-1 experts, $t = 2, \ldots, N$.

Is there a convenient way to extend the likelihoods (7) and (14) to accommodate
dependence? The most straightforward solution is to consider $W = (w_1, w_2, \ldots, w_N)$ to be
a collection of N dependent vectors each on the simplex, having joint density $f_W(W)$. In
the case of P we consider, for each $p_i$, a transformation as in (5),

$$ \begin{pmatrix} Z_i \\ p_i^{(1)} - h_r^{(1)}(\theta) \end{pmatrix} = A(\theta)\, w_i^{(1)}. $$

Extending (6) in the obvious way results in

$$ L(P \mid \theta) = \int_{C(P,\theta)} f(Z_1, p_1^{(1)}, \ldots, Z_N, p_N^{(1)} \mid \theta) \prod_{i=1}^{N} dZ_i $$

where $C(P, \theta)$ denotes the collection of restrictions, $Z_i \in C(p_i, \theta)$, $i = 1, 2, \ldots, N$.
Modification for $L(P \mid \theta \in I_j)$ is obvious. Using (3), if $\Sigma_w^{(i,j)}$ denotes the covariance between
$w_i$ and $w_j$ then $\mathrm{cov}(p_i, p_j) = H(\theta)\, \Sigma_w^{(i,j)}\, H(\theta)^T$.

In the case of Q we consider the associated matrix $\Phi$ where $\varphi_i = G_0(q_i - \theta 1_k)$.
For each $\varphi_i$ we envision a transformation as in (12), mapping $w_i$ to $Z_i$ through $D(\varphi_i)$.
Extending (13) in the obvious way results in

$$L(Q \mid \theta) = \int_{C(Q,\theta)} f(Z_1, q_1, Z_2, q_2, \ldots, Z_N, q_N \mid \theta) \prod_{i=1}^{N} dZ_i \qquad (18)$$

where $C(Q, \theta)$ denotes the collection of restrictions $Z_i \in C(q_i, \theta)$, $i = 1, 2, \ldots, N$. Modification
for $L(Q \mid \theta \in (\gamma_{j-1}, \gamma_j))$ is obvious. It is not possible to explicitly obtain $\mathrm{cov}(q_i, q_j)$.

Convenient mechanisms for producing correlated vectors $w_1, \ldots, w_N$ include (i)
creating correlated multivariate normal vectors and then transforming to the
simplex by an inverse logit transformation or (ii) creating correlated Dirichlet vectors by
using common gamma variables in defining them.
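One possible reading of mechanism (ii) can be sketched as follows. This is an assumption about the construction, not the report's stated recipe: each expert's vector is built from gamma variables that are the sum of a component shared by all experts and a component private to that expert, then normalized. Because independent gammas with a common scale add in their shape parameters, each vector is marginally Dirichlet with parameters `shared_shape + own_shape`, while the shared part induces dependence across experts. The function and argument names are hypothetical.

```python
import random


def correlated_dirichlets(n_experts, shared_shape, own_shape, rng):
    """Draw n_experts dependent vectors on the simplex.

    shared_shape, own_shape: lists of positive gamma shape parameters of
    equal length r. Each returned vector is marginally
    Dirichlet(shared_shape[l] + own_shape[l], l = 1..r).
    """
    if len(shared_shape) != len(own_shape):
        raise ValueError("shape lists must have the same length")
    # Gamma variables common to every expert (source of the dependence).
    shared = [rng.gammavariate(a, 1.0) for a in shared_shape]
    vectors = []
    for _ in range(n_experts):
        # Gamma variables private to this expert.
        own = [rng.gammavariate(a, 1.0) for a in own_shape]
        total = [s + o for s, o in zip(shared, own)]
        norm = sum(total)
        vectors.append([t / norm for t in total])
    return vectors


# Example: N = 5 experts, r = 10, marginals Dirichlet(2, ..., 2).
rng = random.Random(0)
W = correlated_dirichlets(5, [1.0] * 10, [1.0] * 10, rng)
```

Shifting shape from `own_shape` to `shared_shape` while keeping their sum fixed strengthens the dependence without changing the marginals.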


6.  Computational  Issues 

To carry out the Bayesian synthesis we first require specification of the supra
Bayesian's prior. If the supra Bayesian supplies a complete prior distribution for $\theta$, say
$f(\theta)$, the likelihood × prior form is either

$$L(p \mid \theta)\, f(\theta) \quad \text{or} \quad L(q \mid \theta)\, f(\theta). \qquad (19)$$

If the supra Bayesian provides a vector p of probabilities for the intervals $I_j$, the associated
prior is in fact a multinomial trial, i.e., the prior probability of the event $\theta \in I_j$ is $p_j$.
Similarly, if the supra Bayesian provides a vector of quantiles $\gamma$, the associated prior is
again a multinomial trial where the probability of the event $\theta \in (\gamma_{j-1}, \gamma_j)$ is $\alpha_j - \alpha_{j-1}$. Now
the likelihood × prior form is either

$$L(p \mid \theta \in I_j) \cdot p_j \quad \text{or} \quad L(q \mid \theta \in (\gamma_{j-1}, \gamma_j)) \cdot (\alpha_j - \alpha_{j-1}). \qquad (20)$$
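To make the discrete form (20) concrete, the sketch below normalizes likelihood × prior over the k intervals. It assumes the k likelihood values were available as plain numbers, which in the report they are not (they are integrals handled by the Gibbs sampler); the likelihood values used here are invented for illustration, while the prior vector is Chicago's row of Table 1.

```python
def interval_posterior(likelihoods, prior_probs):
    """Posterior probability of each interval: proportional to L_j * p_j."""
    if len(likelihoods) != len(prior_probs):
        raise ValueError("need one likelihood value per interval")
    weights = [lik * p for lik, p in zip(likelihoods, prior_probs)]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("likelihood is zero on every interval")
    return [w / total for w in weights]


# Prior from Table 1 (Chicago); likelihood values are hypothetical.
chicago_prior = [0.073, 0.122, 0.378, 0.207, 0.220]
toy_likelihood = [0.0, 1.0, 2.0, 1.0, 0.0]
updated = interval_posterior(toy_likelihood, chicago_prior)
```

With a constant likelihood the function returns the prior unchanged, which is a quick sanity check on any implementation.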

For (19), under the likelihoods (7) and (14) respectively, the posterior distribution
for $\theta$ takes the not too promising form of a ratio of high dimensional integrals. For (20) all
integrals with respect to Z remain but integration with respect to $\theta$ is replaced by
summation over j. Notice, however, that the joint posterior distribution of $\theta$ and Z given
$p^{(1)}$ is proportional to $f(Z, p^{(1)} \mid \theta) \cdot f(\theta)$ and we seek the marginal posterior of $\theta$ given
$p^{(1)}$. Similarly, the joint posterior distribution of $\theta$ and Z given q is proportional to
$f(Z, q \mid \theta) \cdot f(\theta)$ and we seek the marginal posterior of $\theta$ given q.

The Gibbs sampler, introduced as a Bayesian computing tool in Gelfand and Smith
(1990) and Gelfand et al. (1990), is a Markov chain Monte Carlo approach for producing
samples from the joint posterior distribution, hence from a desired marginal posterior. In
the context of (19) we would obtain $\theta_j^*$, $j = 1, \ldots, m$, as a sample from $f(\theta \mid p)$ or from
$f(\theta \mid q)$ respectively. Appropriate summaries of this sample (kernel density estimates, moments,
quantiles, etc.) provide estimates of desired features of the posterior of $\theta$. In the context of
(20) we obtain frequencies with which $\theta \in I_j$ or $\theta \in (\gamma_{j-1}, \gamma_j)$. These frequencies are
converted to proportions to update $p_j$ or $\alpha_j - \alpha_{j-1}$ respectively. We do not describe the
Gibbs sampler here, referring the reader to the aforementioned papers. Rather we clarify
the complete conditional densities which must be sampled in order to implement the Gibbs
sampler.
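The conversion of sampled frequencies to updated proportions in the context of (20) can be sketched as below. It is assumed, for illustration only, that the sampler output has been reduced to a list of interval indices (0-based), one per retained draw; the function name is hypothetical.

```python
from collections import Counter


def interval_proportions(draws, n_intervals):
    """Turn sampled interval indices into updated interval probabilities."""
    if not draws:
        raise ValueError("no draws supplied")
    counts = Counter(draws)
    m = len(draws)
    # Intervals never visited by the sampler receive proportion zero.
    return [counts.get(j, 0) / m for j in range(n_intervals)]


# Toy chain output: five retained draws over four intervals.
proportions = interval_proportions([0, 1, 1, 2, 1], 4)
```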

Consider $f(\theta, Z \mid p^{(1)}) \propto f(Z, p^{(1)} \mid \theta) \cdot f(\theta)$. To simplify matters we discretize $\theta$ to a
fine grid. This converts the interval of $\theta$'s compatible with p, i.e., $\{\theta : L(p \mid \theta) > 0\}$, to a
discrete set, say $\Theta_p$. For each $\theta \in \Theta_p$ the set $C(p, \theta)$ has positive measure. The Gibbs
sampler simulates realizations of a Markov chain on this $(\theta, Z)$ space. If r is large enough or
the domain of $\theta$ is small enough, the chain will be irreducible and convergence of the
sampler is assured.

Consider sampling Z. Suppose, as we do in the example of section 7, that $f_w(w)$ is
taken to be a Dirichlet density, i.e., $f_w(w) \propto \prod_{i=1}^{r} w_i^{\nu_i}$. Then, from (6), $f(Z_j \mid Z_{-j}, p, \theta)$, the
complete conditional distribution for $Z_j$, is a product of terms restricted to a cross sectional
set of $Z_j$ arising from $C(p, \theta)$ given $Z_{-j}$. That is, if $\theta$, p and all Z's are specified save $Z_j$,
then apart from $Z_j > 0$, (8) imposes k+1 constraints on $Z_j$. These constraints are readily
obtained and result in restriction of $Z_j$ to a nonempty interval. We sample $Z_j$ restricted to
this interval using an approximate inverse c.d.f. method.
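The report does not spell out its approximate inverse c.d.f. method. One common variant, offered here only as an assumed stand-in, evaluates the unnormalized conditional density on a grid over the admissible interval, accumulates it into a discrete c.d.f., inverts that c.d.f. at a uniform draw, and jitters uniformly within the selected cell. The function name, the grid size and the toy density are all hypothetical.

```python
import bisect
import random


def sample_inverse_cdf(density, lower, upper, rng, n_grid=200):
    """Approximate draw from an unnormalized density on [lower, upper]."""
    if not lower < upper:
        raise ValueError("interval must be nonempty")
    step = (upper - lower) / n_grid
    mids = [lower + (g + 0.5) * step for g in range(n_grid)]
    weights = [max(density(x), 0.0) for x in mids]
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("density vanishes on the whole interval")
    # Discrete c.d.f. over grid cells.
    cdf = []
    running = 0.0
    for w in weights:
        running += w
        cdf.append(running / total)
    # Invert the c.d.f.: first cell whose cumulative mass exceeds u.
    u = rng.random()
    cell = min(bisect.bisect_right(cdf, u), n_grid - 1)
    # Uniform jitter inside the chosen cell.
    return mids[cell] + (rng.random() - 0.5) * step


# Toy conditional density on the interval (0.2, 0.6).
rng = random.Random(1)
draws = [sample_inverse_cdf(lambda z: z * (1.0 - z), 0.2, 0.6, rng)
         for _ in range(200)]
```

The approximation error is controlled by `n_grid`; a finer grid costs more density evaluations per draw.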

To sample $\theta$ given Z and p we evaluate $f(Z, p^{(1)} \mid \theta) \cdot f(\theta)$ at Z and p for each $\theta$ in
$\Theta_p$. We then draw $\theta$ from the resulting discrete distribution on $\Theta_p$. Iterating the Gibbs
sampler using m parallel replications results in m replicates of $(\theta, Z)$, indexed by
$j = 1, \ldots, m$, at the t-th iteration. Convergence is assessed using suggestions in Gelfand et al.
(1990) and the resulting set $\theta_j^*$, $j = 1, \ldots, m$, is retained to summarize the posterior of $\theta$
given p. Developing samples from the posterior $f(\theta, Z \mid q)$ is handled in the same manner
using (15) and (13) in place of (8) and (6) respectively. Implementing the Gibbs sampler for
(20) is also handled similarly. Since the specifications $\theta \in I_j$, $\theta \in (\gamma_{j-1}, \gamma_j)$ are treated by
setting $\theta = m_j$, $\theta = n_j$ respectively, a very coarse grid for $\theta$ results.
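The draw of $\theta$ from its discrete complete conditional can be sketched as follows, assuming the log of likelihood × prior has already been evaluated at every grid point (with minus infinity where $C(p, \theta)$ is empty). The function name is hypothetical, and the evaluation of the density itself is deliberately left outside the sketch.

```python
import math
import random


def draw_theta(grid, log_weights, rng):
    """Draw one grid value with probability proportional to exp(log_weight)."""
    if len(grid) != len(log_weights):
        raise ValueError("need one log weight per grid point")
    peak = max(log_weights)
    if peak == float("-inf"):
        raise ValueError("no grid point is compatible with the data")
    # Subtract the maximum before exponentiating to avoid underflow.
    weights = [math.exp(lw - peak) for lw in log_weights]
    return rng.choices(grid, weights=weights, k=1)[0]


# Toy example: only the middle grid point has positive weight.
rng = random.Random(0)
theta = draw_theta([100, 101, 102], [float("-inf"), 0.0, float("-inf")], rng)
```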

Implementation of the Gibbs sampler as described above requires repeated
evaluation of (6) and (8) at specified $\theta$, Z and p or of (13) and (15) at specified $\theta$, Z and q.
For (6) and (8) only calculation of incomplete Beta functions is required to obtain $H(\theta)$,
hence $H^{(1)}(\theta)$, $H^{(2)}(\theta)$ and $H^{(3)}(\theta)$. For (13) and (15) again computation of $B(\varphi)$ only
requires calculation of incomplete Beta functions. Because k is not large, the required matrix
computations should present no problem. However, as r increases,
for any l, the l-th and (l+1)-st columns of $H(\theta)$ and of $B(\varphi)$ will become almost identical. To
avoid having submatrices of H and B which are essentially singular, in practice we define them by
using roughly equally spaced columns from H and B rather than using consecutive ones.

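Let us consider computation of $\partial w / \partial \varphi$ given Z and $\varphi = G_0(q - \theta 1_k)$. From
(12) we see that the matrix of partial derivatives takes the block form

$$\begin{pmatrix} I_{r-1-k} & 0 \\ \cdots & E(\varphi) \end{pmatrix} \qquad (21)$$

where $E(\varphi)$ is a $k \times k$ matrix such that $(E(\varphi))_{ij} = \partial w_i^{(2)} / \partial \varphi_j$. Here $w = (w^{(1)}, w^{(2)})$ where
$w^{(1)}$ is $(r-1-k) \times 1$ and $w^{(2)}$ is $k \times 1$. The relationship
$B^{(2)}(\varphi)\, w^{(2)} + B^{(1)}(\varphi)\, Z - t(\varphi) = 0$
suggests the use of an implicit function theorem to calculate the $(E(\varphi))_{ij}$. In particular

$$\frac{\partial w_i^{(2)}}{\partial \varphi_j} = - \frac{|B^{(2)}_{(i,j)}(\varphi)|}{|B^{(2)}(\varphi)|} \qquad (22)$$

where $B^{(2)}_{(i,j)}(\varphi)$ is the matrix $B^{(2)}(\varphi)$ with the i-th column replaced by the vector $s_j$,
where $s_j$ is a $k \times 1$ vector having all zeroes except in the j-th row where the entry is

$$s_j = \sum_{l=1}^{r} w_l \, Be(\varphi_j \mid \cdots). \qquad (23)$$

In (23) for a given Z and $\varphi$ we obtain $w^{(2)}$, hence the $w_l$, while $Be(\varphi_j \mid \cdots)$ is merely a
Beta density evaluated at $\varphi_j$. From (21) and (22) we see that, regardless of r, in
evaluating $|J|$ we never work with larger than a $k \times k$ determinant.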
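The column-replacement ratio in (22) has the same structure as Cramer's rule for a linear system. The sketch below shows only that generic step for a small system B x = s, using plain Python lists; it is not the report's code, and the minus sign and the particular matrices of (22) are left to the caller. The helper names are hypothetical.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[pivot][c] == 0.0:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for j in range(c, n):
                a[r][j] -= factor * a[c][j]
    return det


def cramer_component(b_matrix, s_vector, i):
    """i-th component of the solution x of B x = s via Cramer's rule."""
    base = determinant(b_matrix)
    if base == 0.0:
        raise ValueError("matrix is singular")
    replaced = [list(row) for row in b_matrix]
    for r, value in enumerate(s_vector):
        replaced[r][i] = value  # swap column i for the vector s
    return determinant(replaced) / base


# Toy 2 x 2 system: 2x + y = 3, x + 3y = 5.
B = [[2.0, 1.0], [1.0, 3.0]]
s = [3.0, 5.0]
x0 = cramer_component(B, s, 0)
x1 = cramer_component(B, s, 1)
```

Since each determinant is only k × k and k is small in the report's setting, the cost of this step does not grow with r.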

We  see  that  computation  is  somewhat  easier  with  p  than  with  q.  However  in  our 
experience  the  q’s  are  more  reliably  elicited  than  the  p’s.  Several  of  the  experts  struggled 
to  comfortably  assign  to  the  intervals  probabilities  which  summed  to  1. 




7.  An  illustrative  example 

Our example involves the development of a prior for the number of points that the
participants in the 1991 National Basketball Association championship series would score
in a series game. The two contestants were the Chicago Bulls and the Los Angeles Lakers.
As the supra Bayesian's opinion for a given team we took the distribution of points scored
by the team in each of the 82 regular season games of the 1990-91 season. The data were
graciously supplied to us in summary form by the Bulls and Lakers organizations.
Histograms for these point totals are provided for each team in Figure 1. Encouraged by
individual normal plots of these samples we took $f(\theta)$ to be N(110.22, 192.10) for the Bulls,
N(106.30, 151.78) for the Lakers based upon $\bar{X}$ and $S^2$ for the respective samples. Table 1
provides, for each team, the proportions of these point totals in the five intervals <90,
90-100, 101-110, 111-120, >120. Table 1 also provides, for each team, the 25th, 50th and
75th percentiles of these point totals. These are the supra Bayesian's p and $\gamma$ vectors.
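As a rough way to compare the normal form chosen for $f(\theta)$ with the empirical proportions in Table 1, the interval probabilities implied by a normal prior can be computed directly. The sketch below uses the means and variances quoted above; the half-integer cut points (90.5, 100.5, ...) are an assumption made here to handle integer point totals and do not come from the report.

```python
import math


def normal_cdf(x, mean, variance):
    """Normal distribution function, parameterized by mean and variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * variance)))


def interval_probabilities(cut_points, mean, variance):
    """Probabilities of the intervals delimited by increasing cut points."""
    cdf = [0.0] + [normal_cdf(c, mean, variance) for c in cut_points] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


# Assumed cut points separating <=90, 91-100, 101-110, 111-120, >=121.
cuts = [90.5, 100.5, 110.5, 120.5]
bulls = interval_probabilities(cuts, 110.22, 192.10)
lakers = interval_probabilities(cuts, 106.30, 151.78)
```

The resulting vectors can be set beside the Table 1 rows to judge how closely the normal approximation tracks the observed season.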

The  experts  included  faculty,  students  and  basketball  players  (not  mutually 
exclusive  sets).  Their  opinions  were  collected  independently  during  the  two  day  window  in 
May  1991  between  the  end  of  the  semi-final  series  (so  that  the  participants  in  the 
championship  series  were  identified)  and  the  start  of  the  championship  series.  We  have  a 
total  of  N  =  5  experts  and  their  opinions  are  presented  in  Table  2. 

Our analysis is based exclusively on likelihoods of form (7) (with modification for
the case $\theta \in I_j$) and (14) (with modification to the case $\theta \in (\gamma_{j-1}, \gamma_j)$) for
independent judges. For (7) we have k=5, $m_j = 75 + 10j$, $j = 1, \ldots, 5$ and $G_0 = N(0, (10)^2)$,
suggesting a roughly 60 point range between high and low point total. We set r=10 and
took $\delta = 1$, $\epsilon = .002$ for Chicago, $\delta = 1$, $\epsilon = .002$ for Los Angeles with, for both teams, $f_w(w)$ =
D(2,2,...,2). The updated priors for $\theta$ are shown in the form of histograms in figures 2a
and 2b. The updated probabilities for the intervals <90, 90-100, etc. are given in Table 3
and may be compared with Table 1. For (14) we have k=3. We chose the $n_j$ to
correspond to the (2j-1)/8 quantile of the samples in Figure 1, resulting in $n_1 = 95$, $n_2 =
105$, $n_3 = 113$, $n_4 = 129$ for the Bulls with $n_1 = 92$, $n_2 = 103$, $n_3 = 111$, $n_4 = 120$ for the
Lakers. We again set r=10 and took $\delta = 1$, $\epsilon = .004$ for Chicago, $\delta = 1$, $\epsilon = .004$ for Los
Angeles with again $f_w(w)$ = D(2,2,...,2). The resulting updated priors for $\theta$ are given in
figures 2c and 2d. The updated probabilities associated with $\gamma_1$, $\gamma_2$ and $\gamma_3$ are .142, .661,
and 1 for the Bulls with .166, .7 and .9766 for the Lakers. The championship series
ultimately went five games with scores given in Table 4.

8.  Summary 

We  have  supplied  a  framework  within  which  to  model  and  aggregate  expert  opinion. 
Our  development,  though  more  demanding  than  previous  ones,  is  attractive  in  providing 

(i)  a  likelihood  for  expert  opinion  appropriate  for  the  most  reliably  elicited 
forms  of  opinion 

(ii)  a  rich  family  of  likelihoods  for  such  opinion 

(iii)  immediate  extension  to  modeling  collections  of  expert  opinion 

(iv)  compatibility  with  the  supra  Bayesian’s  knowledge  about  the  process  and 
about  the  experts 

(v)  a  natural  pooling  mechanism  for  synthesizing  all  the  information. 

References 

Aitchison, J. (1986). The Statistical Analysis of Compositional Data. London/New York:
Chapman and Hall.

Antoniak, C.E. (1974). Mixtures of Dirichlet processes with applications to Bayesian
nonparametric problems. Ann. Statist. 2, 1152-1174.

Bernardo, J. (1985). Discussion of "Reconciliation of discrete probability distributions".
In: Bayesian Statistics 2, eds. J.M. Bernardo et al., 388-389, North Holland,
Amsterdam.

Chatterjee, S. and Chatterjee, S. (1987). On combining expert opinions. American
Journal of Mathematical and Management Sciences 7 (3 and 4), 271-295.

Devroye, L. (1988). Non-Uniform Random Variate Generation. New York:
Springer-Verlag.

Diaconis, P. and Ylvisaker, D. (1985). Quantifying prior opinion. In: Bayesian
Statistics 2, eds. J.M. Bernardo et al., 133-156, North Holland, Amsterdam.

Ferguson, T. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist.
1, 209-230.

French, S. (1985). Group consensus probability distributions: a critical survey. In:
Bayesian Statistics 2, eds. J.M. Bernardo et al., 183-201, North Holland,
Amsterdam.

Gelfand, A. and Smith, A.F.M. (1990). Sampling based approaches to calculating marginal
densities. J. Amer. Statist. Assoc. 85, 398-409.

Gelfand, A., Hills, S.E., Racine, A. and Smith, A.F.M. (1990). Illustration of Bayesian
inference in normal data models using Gibbs sampling. J. Amer. Statist. Assoc. 85,
972-985.

Genest, C. and Schervish, M. (1985). Modelling expert judgement for Bayesian updating.
Ann. Statist. 13, 1198-1212.

Genest, C. and Zidek, J. (1986). Combining probability distributions: a critique and an
annotated bibliography (with discussion). Statistical Science 1, 114-148.

Kadane, J., Dickey, J., Winkler, R., Smith, W., and Peters, S. (1980). Interactive
elicitation of opinion for a normal linear model. J. Amer. Statist. Assoc. 75,
845-854.

Kahneman, D., Slovic, P., and Tversky, A. (1982). Judgment Under Uncertainty:
Heuristics and Biases. Cambridge University Press, New York.

Keeney, R.L., and Raiffa, H. (1976). Decisions with Multiple Objectives: Preferences and
Value Tradeoffs. New York: John Wiley & Sons.

Lindley, D.V. (1985). Reconciliation of discrete probability distributions. In: Bayesian
Statistics 2, eds. J.M. Bernardo et al., 375-390, North Holland, Amsterdam.

West, M. (1988). Combining expert opinion. In: Bayesian Statistics 3, eds. J.M.
Bernardo et al., 375-390, North Holland, Amsterdam.

Winkler, R. (1968). The consensus of subjective probability distributions. Management
Sci. 15, 61-75.




Table 1: Summary of point totals for Chicago and for Los Angeles for the 82
regular season games of the 1990-91 season.

                            Intervals                          Quantiles
               <90   90-100  101-110  111-120   >120      1st      2nd      3rd
Chicago       .073    .122     .378     .207    .220    101.75   108.00   118.50
Los Angeles   .110    .195     .305     .293    .097     99.00   107.00   115.00

Chicago:       $\bar{X}$ (prior mean) = 110.22    $S^2$ (prior variance) = 192.10
Los Angeles:   $\bar{X}$ (prior mean) = 106.30    $S^2$ (prior variance) = 151.78

Table 2: Probability and quantile elicitation of experts

Chicago:
                            Intervals                          Quantiles
Expert         <90   90-100  101-110  111-120   >120      1st      2nd      3rd
1              ...     .15      .50      .25     .05      102      108      117
2              ...    .394     .394     .099    .007     94.6      100    105.4
3              .05     .20      .45      .20     .10       98      104      112
4              .10     .18      .35      .30     .07       95      105      113
5              .05     .15      .30      .30     .20       90      105      115
pmin           .05     .15      .30     .099    .006  qmin 90      100    105.4
pmax          .106    .394      .50      .30     .20  qmax 102     108      117

Los Angeles:
                            Intervals                          Quantiles
Expert         <90   90-100  101-110  111-120   >120      1st      2nd      3rd
1              .05     .15      ...      .35     .05      102      109      117
2             .354    .455      ...     .016    .001     87.6     93.0     98.4
3              .05     .10      ...      .25     .15      103      107      115
4              .13     .28      ...      .22     .05       92      100      110
5               .1      .3       .3       .2      .1       85      100      110
pmin           .05      .1     .174     .016    .001  qmin 85       93     98.4
pmax          .354    .455      .45      .35     .15  qmax 103     109      117

Table 3: Updated probabilities for intervals

                            Intervals
               <90   90-100  101-110  111-120   >120
Chicago          0     .21      .52      .27       0
Los Angeles      0     .27      .47      .26       0

Chicago:       posterior mean = 107.5    posterior variance = 22.34
Los Angeles:   posterior mean = 105.8    posterior variance = 18.55


Table 4: Scores for the championship series

Game    Chicago    Los Angeles
1          91          93
2         107          86
3         104          96
4          97          82
5         108         101

Chicago wins series 4-1


[Figure: Updated priors for $\theta$ using probabilities; horizontal axis: theta]




